
ci(release): gate python wheels on e2e for tagged releases#319

Merged: drew merged 14 commits into main from more-ci-updates-2 on Mar 15, 2026

drew commented Mar 15, 2026

Summary

Ensure all tagged release artifacts are gated by e2e tests, and remove unnecessary e2e gating from dev release Docker images.

Changes

  • release-tag.yml: Add e2e to publish-python needs so Python wheels are not published to Artifactory until e2e passes
  • release-dev.yml: Remove e2e from tag-ghcr-dev needs since dev Docker images don't need to wait for e2e
  • policy-advisor example: Replace gitlab-master.nvidia.com references with internal.corp.example.com

Updated gate table

| Artifact | Tagged Release | Dev Release |
|---|---|---|
| Docker images (GHCR) | Gated by e2e | Not gated by e2e |
| Python wheels (S3/Artifactory) | Gated by e2e | Not gated by e2e |
| GitHub Release (CLI + wheels) | Gated by e2e (transitively) | Not gated by e2e |
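The "transitively" entry follows from how GitHub Actions `needs:` works: a job waits for every job it lists, and those jobs wait for theirs. The Python sketch below checks that reasoning against hypothetical job graphs. Only `e2e`, `publish-python` and `tag-ghcr-dev` come from this PR; every other job name is a placeholder, not the real workflow.

```python
# Hypothetical `needs:` graphs after this PR (job -> jobs it waits for).
# Only e2e, publish-python and tag-ghcr-dev are real names from the PR.
TAG_NEEDS = {
    "build": [],
    "e2e": ["build"],
    "publish-docker": ["build", "e2e"],
    "publish-python": ["build", "e2e"],  # this PR adds "e2e" here
    "github-release": ["publish-docker", "publish-python"],
}
DEV_NEEDS = {
    "build": [],
    "e2e": ["build"],
    "tag-ghcr-dev": ["build"],  # this PR removes "e2e" here
    "publish-python-dev": ["build"],
    "github-release-dev": ["tag-ghcr-dev", "publish-python-dev"],
}


def is_gated(job: str, gate: str, needs: dict[str, list[str]]) -> bool:
    """Return True if `job` cannot start until `gate` finishes, directly or transitively."""
    stack = list(needs.get(job, []))
    seen: set[str] = set()
    while stack:
        dep = stack.pop()
        if dep == gate:
            return True
        if dep not in seen:
            seen.add(dep)
            stack.extend(needs.get(dep, []))
    return False


if __name__ == "__main__":
    for label, graph in (("tagged", TAG_NEEDS), ("dev", DEV_NEEDS)):
        for job in graph:
            if job not in ("build", "e2e"):
                print(f"{label:6} {job:18} gated by e2e: {is_gated(job, 'e2e', graph)}")
```

In this model the GitHub Release job never lists `e2e` itself, yet it is still gated because each publish job it waits on now needs `e2e`.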

Testing

  • mise run pre-commit passes (YAML-only changes, no code)
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

drew added 2 commits March 14, 2026 22:58
- Add e2e to publish-python needs in release-tag.yml so wheels are not
  published to Artifactory until e2e passes
- Remove e2e gate from tag-ghcr-dev in release-dev.yml since dev Docker
  images do not need to wait for e2e
- Replace gitlab-master.nvidia.com references with generic example host
  in policy-advisor CTF example
drew self-assigned this Mar 15, 2026
drew added 12 commits March 14, 2026 23:25
…mand test

Switch the release canary from Docker-outside-of-Docker (host socket
mount) to true Docker-in-Docker. The CI container now starts its own
dockerd, so the gateway cluster container is a child process and
127.0.0.1 port bindings are reachable directly.

This enables testing the real zero-to-sandbox user path: a single
`openshell sandbox create` that auto-bootstraps the gateway, pulls the
cluster image, and creates a sandbox — no --gateway-host workaround.

Dockerfile.ci changes:
- Add iptables (required by dockerd for container networking)
- Extract full Docker daemon suite (dockerd, containerd, runc) instead
  of CLI only

release-canary.yml changes:
- Remove /var/run/docker.sock volume mount
- Add dockerd startup step
- Remove gateway host resolution and explicit gateway start steps
- Simplify canary to single auto-bootstrap sandbox create command

The first canary run revealed two issues:

1. dockerd failed to start because docker-proxy was not extracted from
   the Docker static binary tarball. Add it to the extraction list.

2. The GitHub Actions runner injects its own Docker socket into job
   containers. Without an explicit DOCKER_HOST, the openshell CLI
   connected to the runner's host Docker daemon instead of our DinD
   daemon. Start dockerd on a dedicated socket (/var/run/dind.sock)
   and export DOCKER_HOST so all subsequent steps use it.
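The Dockerfile.ci change in these intermediate commits is a selective extraction from the Docker static tarball. A later commit drops these additions once the canary goes back to DooD. The Python sketch below shows the extraction step. It assumes the `docker/<binary>` layout of Docker's static release tarballs, and the function name is hypothetical.

```python
import tarfile
from pathlib import Path

# Binaries named in the commit messages: the CLI plus the daemon suite,
# including docker-proxy, whose absence made dockerd fail to start.
WANTED = ("docker", "dockerd", "containerd", "runc", "docker-proxy")


def extract_docker_binaries(tgz: Path, dest: Path, wanted: tuple[str, ...] = WANTED) -> list[Path]:
    """Copy selected binaries out of a Docker static tarball into `dest`.

    Assumes the `docker/<binary>` layout. Members are written by basename
    only, so paths inside the archive cannot escape `dest`.
    """
    dest.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with tarfile.open(tgz, "r:gz") as tar:
        for member in tar.getmembers():
            path = Path(member.name)
            if not (member.isfile() and path.parent.name == "docker" and path.name in wanted):
                continue
            src = tar.extractfile(member)
            if src is None:
                continue
            out = dest / path.name
            out.write_bytes(src.read())
            out.chmod(0o755)  # make the binary executable
            written.append(out)
    missing = set(wanted) - {p.name for p in written}
    if missing:
        raise FileNotFoundError(f"missing from tarball: {sorted(missing)}")
    return written
```

Raising on a missing name makes the first failure mode above (no `docker-proxy`) show up at build time, not when dockerd starts.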

Using a custom socket path and DOCKER_HOST breaks the GitHub Actions
runner's internal Docker operations (it uses docker exec to run steps
inside the container). Since we removed the host socket volume mount,
/var/run/docker.sock is free inside the container — just start dockerd
on the default path with no DOCKER_HOST override needed.

The GHA runner injects its own /var/run/docker.sock into the container
for management, so dockerd can't bind to the default path. Use a
dedicated socket (/var/run/dind.sock) and set DOCKER_HOST only on
steps that need it (via step-level env) to avoid breaking the runner.
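Scoping DOCKER_HOST to a single step works because the override applies only to the commands that step launches. The runner's own `docker exec` calls still see the default socket. The Python sketch below shows the same idea with a per-process environment. `run_with_dind` is a hypothetical helper, and the socket path is the one named in the commit.

```python
import os
import subprocess
import sys

# Dedicated socket for the inner DinD daemon (path taken from the commit message).
DIND_HOST = "unix:///var/run/dind.sock"


def run_with_dind(cmd: list[str]) -> "subprocess.CompletedProcess[str]":
    """Run `cmd` with DOCKER_HOST pointing at the DinD socket.

    Only the child sees the override; this process's os.environ is untouched,
    mirroring a step-level `env:` block instead of a job-wide export.
    """
    env = {**os.environ, "DOCKER_HOST": DIND_HOST}
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    probe = [sys.executable, "-c", "import os; print(os.environ['DOCKER_HOST'])"]
    print(run_with_dind(probe).stdout.strip())
    print(os.environ.get("DOCKER_HOST", "<unset>"))
```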

Each GHA step runs via docker exec, which sends SIGHUP to backgrounded
processes when the shell exits. Use nohup to detach dockerd from the
step's process group so it persists across steps.

setsid creates a new session and process group, ensuring dockerd
survives when the GHA runner's docker-exec shell exits between steps.
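In Python, the equivalent of launching under `setsid` is `subprocess.Popen(..., start_new_session=True)`. The sketch below uses it in a hypothetical helper and verifies the child leads its own session. The following commit notes that this alone did not keep dockerd alive across steps.

```python
import subprocess


def start_detached(cmd: list[str], log_path: str) -> "subprocess.Popen[bytes]":
    """Start `cmd` in its own session, like running it under setsid(1).

    start_new_session=True makes the child call setsid() before exec, so it
    becomes the leader of a new session and process group and is no longer
    in the group of the shell that launched it.
    """
    with open(log_path, "ab") as log:  # the child keeps its own copy of the fd
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
```

For dockerd this could be `start_detached(["dockerd", "--host=unix:///var/run/dind.sock"], "/tmp/dockerd.log")`, where the log path is an assumption.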

Background processes started via docker-exec don't persist across GHA
steps — each step gets a fresh docker-exec invocation. Move dockerd
startup into the canary test step itself so it shares the same shell
session and stays alive for the duration of the test.
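When dockerd starts in the same step as the test, the step has to wait until the daemon is accepting connections before running `openshell sandbox create`. The commit does not say how readiness is checked. The Python sketch below assumes polling the Unix socket until `connect()` succeeds, and the helper name is hypothetical.

```python
import socket
import time


def wait_for_unix_socket(path: str, timeout: float = 30.0, interval: float = 0.2) -> bool:
    """Poll until something accepts connections on the Unix socket at `path`.

    Returns True once a connect() succeeds, or False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            try:
                s.connect(path)
                return True
            except (FileNotFoundError, ConnectionRefusedError):
                pass  # socket not created yet, or nobody listening yet
        time.sleep(interval)
    return False
```

A shell step would more typically loop on `docker info` against the same socket, which also confirms the API responds.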

The GHA container uses overlayfs, and the inner dockerd also defaults
to overlayfs. Overlayfs can't use another overlayfs mount as its upper
layer, so container creation fails. Use --storage-driver=vfs, which copies
layers instead of layering them: slower, but reliable for DinD.
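The commit hard-codes `--storage-driver=vfs`. The reasoning behind it (the container root is already overlayfs) can be checked directly from `/proc/self/mounts`. The Python sketch below is a hypothetical helper illustrating that check, not something the workflow does.

```python
from pathlib import Path
from typing import Optional


def root_fstype(mounts_text: str) -> Optional[str]:
    """Return the filesystem type mounted at "/" in /proc/self/mounts-style text."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[1] == "/":
            fstype = fields[2]  # a later mount on "/" shadows earlier ones
    return fstype


def pick_storage_driver(mounts_text: str) -> str:
    """Use vfs when "/" is already overlayfs, since overlay can't nest on overlay."""
    return "vfs" if root_fstype(mounts_text) == "overlay" else "overlay2"


if __name__ == "__main__":
    mounts = Path("/proc/self/mounts")
    text = mounts.read_text() if mounts.exists() else ""
    print(f"--storage-driver={pick_storage_driver(text)}")
```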

Add OPENSHELL_GATEWAY_HOST environment variable support to the sandbox
create auto-bootstrap path. This mirrors the --gateway-host flag on
`gateway start` but works for the implicit bootstrap triggered by
`sandbox create` when no gateway exists.

In CI containers using Docker-outside-of-Docker (host socket mount),
127.0.0.1 inside the CI container doesn't reach sibling gateway
containers. Setting OPENSHELL_GATEWAY_HOST=host.docker.internal fixes
this without requiring the two-step gateway-start-then-sandbox-create
workflow.
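The commit does not state the precedence between `--gateway-host` and `OPENSHELL_GATEWAY_HOST`, and the CLI's own code is not shown here. The Python sketch below is only an illustration. It assumes the explicit flag wins, then the environment variable, then a loopback default of `127.0.0.1`, and `resolve_gateway_host` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

DEFAULT_GATEWAY_HOST = "127.0.0.1"  # assumed loopback default


def resolve_gateway_host(flag: Optional[str], env: Mapping[str, str]) -> str:
    """Pick the host used to reach the gateway container.

    Assumed precedence: explicit --gateway-host flag, then the
    OPENSHELL_GATEWAY_HOST variable, then the loopback default.
    """
    if flag:
        return flag
    from_env = env.get("OPENSHELL_GATEWAY_HOST", "").strip()
    return from_env or DEFAULT_GATEWAY_HOST


if __name__ == "__main__":
    # `sandbox create` has no flag in this model, so only the env var applies.
    print(resolve_gateway_host(None, os.environ))
```

With `OPENSHELL_GATEWAY_HOST=host.docker.internal` exported in a DooD job, the implicit bootstrap would target the host's published ports in this model.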

Update release canary to use the single-command path: just
`openshell sandbox create` which auto-bootstraps everything. For
workflow_dispatch (branch testing), builds CLI from source to test the
current branch code. For workflow_run (release testing), installs the
published binary.
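The canary chooses how to obtain the CLI based on what triggered the run. GitHub Actions exposes the trigger as `GITHUB_EVENT_NAME`. The Python sketch below shows that branching, and the returned labels are hypothetical names for the two paths in the commit message.

```python
from typing import Mapping


def canary_install_plan(env: Mapping[str, str]) -> str:
    """Decide how the canary obtains the CLI, based on the triggering event."""
    event = env.get("GITHUB_EVENT_NAME", "")
    if event == "workflow_dispatch":
        return "build-from-source"  # manual run: test the current branch's code
    if event == "workflow_run":
        return "install-published"  # after a release: test the shipped binary
    raise ValueError(f"unexpected trigger: {event!r}")


if __name__ == "__main__":
    print(canary_install_plan({"GITHUB_EVENT_NAME": "workflow_dispatch"}))
```

In the workflow file itself this is usually expressed as step-level `if:` conditions on `github.event_name`, not as a script.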

Use the explicit --gateway-host flag on gateway start (works with
current published CLI) while also setting OPENSHELL_GATEWAY_HOST env
var (will be picked up once the next release ships with env var
support). Once the env var support is released, the canary can switch
to the single-command sandbox create path.

The canary uses DooD (host socket mount), not DinD, so the dockerd,
containerd, runc, docker-proxy, and iptables additions are unnecessary.

The gateway host override is useful in any environment where the
client can't reach the Docker host at 127.0.0.1 — CI containers,
WSL, remote Docker hosts, etc. Update the CLI help text, DeployOptions
doc comment, and bootstrap env var comment to reflect this.
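For the remote Docker host case, the address to put in `OPENSHELL_GATEWAY_HOST` is usually the host already named in `DOCKER_HOST`. The commit does not say the CLI derives it this way. The Python sketch below is a hypothetical helper that extracts that hostname so it can be passed as the override.

```python
import os
from typing import Optional
from urllib.parse import urlparse


def host_from_docker_host(docker_host: Optional[str]) -> Optional[str]:
    """Return the hostname in a tcp:// or ssh:// DOCKER_HOST, else None.

    None means the daemon is local (unix:// socket or DOCKER_HOST unset),
    so the loopback default can stay in place.
    """
    if not docker_host:
        return None
    parsed = urlparse(docker_host)
    if parsed.scheme in ("tcp", "ssh"):
        return parsed.hostname
    return None


if __name__ == "__main__":
    print(host_from_docker_host(os.environ.get("DOCKER_HOST")))
```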
drew merged commit aa2e271 into main Mar 15, 2026
9 checks passed
drew deleted the more-ci-updates-2 branch March 15, 2026 07:39